341 research outputs found

    CAN-DO VS. WILL-DO FACTORS: PREDICTING THE GOLD-STANDARD MARINE

    The Marine Corps has historically used the high school diploma and Armed Services Vocational Aptitude Battery scores to define a high-quality enlisted Marine. This industrial-era approach fails to consider the enlistee holistically, despite evidence that a combination of cognitive and non-cognitive assessments paints a more complete picture of an enlistee. In addition to utilizing outdated recruitment methods, the current manpower system fails to identify where a particular Marine falls on a range of skills, with the extremes being generalist and specialist. Using factor analysis, machine learning, and multivariate logistic regression, this research utilizes existing personnel data to develop proxy variables that support Marine Corps efforts to better predict which enlistees will be gold-standard Marines, as well as predicting whether an enlisted Marine is a generalist or specialist. Given that proxy variables are generated to replace data that is provided by the Tailored Adaptive Personality Assessment System (TAPAS), the Marine Corps should validate the predictive accuracy of these models using TAPAS data once it is available. The bottom line is that this research provides evidence that the current manpower and recruiting systems can be refined to support more accurate decision making that will enable the Marine Corps to achieve future manpower and operating environment requirements. Major, United States Marine Corps. Approved for public release. Distribution is unlimited.

    Computational Methods for Inferring Transcriptome Dynamics

    The sequencing of the human genome paved the way for a new type of medicine, in which a molecular-level, cell-by-cell understanding of the genomic control system informs diagnosis and treatment. A key experimental approach for achieving such understanding is measuring gene expression dynamics across a range of cell types and biological conditions. The raw outputs of these experiments are millions of short DNA sequences, and computational methods are required to draw scientific conclusions from such experimental data. In this dissertation, I present computational methods to address some of the challenges involved in inferring dynamic transcriptome changes. My work focuses on two types of challenges: (1) discovering important biological variation within a population of single cells and (2) robustly extracting information from sequencing reads. Three of the methods are designed to identify biologically relevant differences among a heterogeneous mixture of cells. SingleSplice uses a statistical model to detect true biological variation in alternative splicing within a population of single cells. SLICER elucidates transcriptome changes during a sequential biological process by positing the process as a nonlinear manifold embedded in high-dimensional gene expression space. MATCHER uses manifold alignment to infer what multiple types of single cell measurements obtained from different individual cells would look like if they were performed simultaneously on the same cell. These methods gave insight into several important biological systems, including embryonic stem cells and cardiac fibroblasts undergoing reprogramming. To enable study of the pseudogene ceRNA effect, I developed a computational method for robustly computing pseudogene expression levels in the presence of high sequence similarity that confounds sequencing read alignment. AppEnD, an algorithm for detecting untemplated additions, allowed the study of transcript modifications during RNA degradation. Doctor of Philosophy.

    Robust detection of alternative splicing in a population of single cells

    Single cell RNA-seq experiments provide valuable insight into cellular heterogeneity but suffer from low coverage, 3′ bias and technical noise. These unique properties of single cell RNA-seq data make study of alternative splicing difficult, and thus most single cell studies have restricted analysis of transcriptome variation to the gene level. To address these limitations, we developed SingleSplice, which uses a statistical model to detect genes whose isoform usage shows biological variation significantly exceeding technical noise in a population of single cells. Importantly, SingleSplice is tailored to the unique demands of single cell analysis, detecting isoform usage differences without attempting to infer expression levels for full-length transcripts. Using data from spike-in transcripts, we found that our approach detects variation in isoform usage among single cells with high sensitivity and specificity. We also applied SingleSplice to data from mouse embryonic stem cells and discovered a set of genes that show significant biological variation in isoform usage across the set of cells. A subset of these isoform differences is linked to cell cycle stage, suggesting a novel connection between alternative splicing and the cell cycle.

    SLICER: inferring branched, nonlinear cellular trajectories from single cell RNA-seq data

    Accuracy of trajectory reconstruction using a subset of cells. (a) Graph showing how similar the SLICER trajectory is when computed using a random subset of lung cells. The blue bars show the similarity in cell ordering (units are percent sorted with respect to the trajectory constructed from all cells). The orange bars show the similarity in branch assignments (percentage of cells assigned to the same branch as the trajectory constructed from all cells). The values shown were obtained by averaging the results from five subsampled datasets for each percentage (80%, 60%, 40%, and 20%). (b) Order preservation and branch identity values computed as in panel (a), but for datasets sampled from the neural stem cell dataset. (PDF 106 kb)

    TUT7 catalyzes the uridylation of the 3′ end for rapid degradation of histone mRNA

    The replication-dependent histone mRNAs end in a stem–loop instead of the poly(A) tail present at the 3′ end of all other cellular mRNAs. Following processing, the 3′ end of histone mRNAs is trimmed to 3 nucleotides (nt) after the stem–loop, and this length is maintained by addition of nontemplated uridines if the mRNA is further trimmed by 3′hExo. These mRNAs are tightly cell-cycle regulated, and a critical regulatory step is rapid degradation of the histone mRNAs when DNA replication is inhibited. An initial step in histone mRNA degradation is digestion 2–4 nt into the stem by 3′hExo and uridylation of this intermediate. The mRNA is subsequently degraded by the exosome, with stalled intermediates being uridylated. The enzyme(s) responsible for oligouridylation of histone mRNAs have not been definitively identified. Using high-throughput sequencing of histone mRNAs and degradation intermediates, we find that knockdown of TUT7 reduces both the uridylation at the 3′ end and the uridylation of the major degradation intermediate in the stem. In contrast, knockdown of TUT4 did not alter the uridylation pattern at the 3′ end and had a small effect on uridylation in the stem–loop during histone mRNA degradation. Knockdown of 3′hExo also altered the uridylation of histone mRNAs, suggesting that TUT7 and 3′hExo function together in trimming and uridylating histone mRNAs.

    Characterization and reduction of line-to-line crosstalk on printed circuit boards

    Master of Science. Department of Electrical and Computer Engineering. William B. Kuhn. An important concern for high speed circuit designs is that of crosstalk and electromagnetic interference. In board-level PCB designs, crosstalk at microwave frequencies may result from imperfections in shielding of PCB interconnects or, more generally, transmission lines. Several studies have been done to characterize and improve the isolation between PCB transmission lines for both digital and RF circuits. For example, previous studies in the microwave region have examined the effect that line type, line length, and separation have on crosstalk and suggest that without full shielding, the upper limit of isolation is on the order of 60 dB for traditional board-level lines [1]. In order to more fully characterize crosstalk and improve isolation above 60 dB, this thesis studies signal-to-ground-plane separation, considers advanced line types, and examines the effect of 3D shielding. Results are presented from 100 MHz to 30 GHz for the traditional transmission line structures of microstrip, CPW, differential pair and CPW differential pair. This study shows that halving the distance between signal and ground planes can improve isolation between transmission lines by as much as 20 dB, making this one of the best ways to improve performance. Advanced methods of shielding are then presented. Direct launch stripline and single-sided CPW improve upon existing crosstalk reduction techniques, while split shielding and ablation of dielectric PCB material are also proposed. The data and additional crosstalk reduction techniques discussed in this thesis serve two purposes. First, with a more complete understanding of the effects that transmission line types and parameters have on crosstalk, engineers can quickly identify potential crosstalk issues and resolve them before manufacturing. Second, this thesis presents the engineer with four additional techniques that may become available in advanced manufacturing environments. Such techniques can further reduce crosstalk and may allow isolation values to approach 100 dB at the PC board level.

    Lessons Learned from Deploying an Analytical Task Management Database

    Defining requirements, missions, technologies, and concepts for space exploration involves multiple levels of organizations, teams of people with complementary skills, and analytical models and simulations. Analytical activities range from filling a To-Be-Determined (TBD) in a requirement to creating animations and simulations of exploration missions. In a program as large as returning to the Moon, there are hundreds of simultaneous analysis activities. A way to manage and integrate efforts of this magnitude is to deploy a centralized database that provides the capability to define tasks, identify resources, describe products, schedule deliveries, and generate a variety of reports. This paper describes a web-accessible task management system and explains the lessons learned during the development and deployment of the database. Through the database, managers and team leaders can define tasks, establish review schedules, assign teams, link tasks to specific requirements, identify products, and link the task data records to external repositories that contain the products. Data filters and spreadsheet export utilities provide a powerful capability to create custom reports. Import utilities provide a means to populate the database from previously filled form files. Within a four-month period, a small team analyzed requirements, developed a prototype, conducted multiple system demonstrations, and deployed a working system supporting hundreds of users across the aerospace community. Open-source technologies and agile software development techniques, applied by a skilled team, enabled this impressive achievement. Topics in the paper cover the web application technologies, agile software development, an overview of the system's functions and features, dealing with increasing scope, and deploying new versions of the system.

    The word landscape of the non-coding segments of the Arabidopsis thaliana genome

    Background: Genome sequences can be conceptualized as arrangements of motifs or words. The frequencies and positional distributions of these words within particular non-coding genomic segments provide important insights into how the words function in processes such as mRNA stability and regulation of gene expression. Results: Using an enumerative word discovery approach, we investigated the frequencies and positional distributions of all 65,536 different 8-letter words in the genome of Arabidopsis thaliana. Focusing on promoter regions, introns, and 3' and 5' untranslated regions (3'UTRs and 5'UTRs), we compared word frequencies in these segments to genome-wide frequencies. The statistically interesting words in each segment were clustered with similar words to generate motif logos. We investigated whether words were clustered at particular locations or were distributed randomly within each genomic segment, and we classified the words using gene expression information from public repositories. Finally, we investigated whether particular sets of words appeared together more frequently than others. Conclusion: Our studies provide a detailed view of the word composition of several segments of the non-coding portion of the Arabidopsis genome. Each segment contains a unique word-based signature. The respective signatures consist of the sets of enriched words, 'unwords', and word pairs within a segment, as well as the preferential locations and functional classifications for the signature words. Additionally, the positional distributions of enriched words within the segments highlight possible functional elements, and the co-associations of words in promoter regions likely represent the formation of higher order regulatory modules. This work is an important step toward fully cataloguing the functional elements of the Arabidopsis genome.